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1. Introduction 


Magnetoencephalography (MEG) can be considered as one of the most powerful instruments 
for non-invasive investigation of the brain functions (Hämäläinen et al., 1993). Its good 
temporal resolution (sampling frequency can reach several thousands of Hertz), only 
comparable to the sampling frequency of electroencephalography (EEG), allows following the 
neural dynamics ona millisecond time scale. This implies that, with respect to other functional 
imaging modalities, MEG provides an enormous amount of information about the brain 
activity. On the other hand, localization of brain activity from MEG data requires to solve 
a highly ill-posed inverse problem (Sarvas, 1987): exact reconstruction of the neural current 
is not possible even with noise-free data, and the debate about the actual spatial resolution of 
MEG is still open. Despite this drawback, the appeal of MEG data is such that a large number 
of methods have been developed for localizing brain activity from MEG recordings. Starting 
from the middle Eighties, when the first Minimum Norm Estimate (Hämäläinen & IImoniemi, 
1984) appeared, to the recent developments in spatio temporal regularization (Ou et al., 2009), 
state space models (Long et al., 2006; Sorrentino et al., 2009) and Markov Chain Monte Carlo 
(Jun et al., 2005), the attempts to insert enough a priori information to actually provide stable 
and reliable solutions have been rising rather than diminishing. The goal and the scope 
are clear: automatic and reliable source localization can be of practical use in several fields, 
possibly including clinical applications (pre-surgical evaluation, foci localization in epilepsy), 
new technology development (e.g. brain computer interfaces), and scientific research (e.g. for 
analysis of large datasets). 

Despite having been presented in rather different frameworks and with largely different 
names and techniques, most methods appeared so far in the MEG inverse literature can be 
described rather straightforwardly in a statistical setting, which helps unifying and clarifying 
the assumptions and the limitations of the methods. In this chapter, we attempt a brief 
review of the most largely known methods, as well as of some of the most recently appeared 
ones, highlighting the statistical features. We first provide a brief description of the statistical 
approach to inverse problems in Section 2; then we describe the main features of the MEG 
inverse problem in Section 3; in Section 4 and 5 we review methods developed by other 
authors, while in Section 6 we discuss our work on Bayesian filtering for MEG. 
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2. Statistics and inverse problems 


2.1 Inverse problems 

In applied mathematics the term inverse problem denotes a class of problems, often 
originating from physics equations, where the unknown is an unobserved quantity which 
has to be recovered from indirect measurements. Rather often, there is a causal relationship 
between the unknown and the data, and the inverse problem involves moving backwards 
from a measured effect to an unknown cause. An excellent source of inverse problems is the 
field of medical imaging, where one tries to recover information about quantities within the 
body from external measurements. 

Inverse problems are usually represented in functional form as 


g=Alf) (1) 


where g € Hp, f € H), and A : Hı — H; is an operator between the two possibly distinct 
functional spaces (e.g. Banach spaces, Hilbert spaces, etc.) Hı and H2. From a mathematical 
viewpoint, the inverse problem consists in finding the function f given the function g and the 
operator A. In most cases of practical interest, however, the solution of the inverse problem 
cannot be found exactly, because the operator A is pathological: either its kernel is non-empty 
- leading to non-uniqueness of the inverse solution - or the range of A is not the whole Hp - 
leading to non-existence problems - or the inverse operator, when it exists, is not continuous - 
leading to stability problems. In applied mathematics, the typical solution to these pathologies 
is provided in the celebrated regularization framework for linear inverse problems (Bertero 
& Boccacci, 1998), where the ill-posed problem is replaced by a one-parameter family of 
well-posed ones. The most common regularization algorithm, Tichonov method, consists in 
finding the minimizer f of 


lg — A> fllfty + AIFF (2) 
which is rather straightforwardly 


f=(A*A+AI) 1 A*g (3) 


where A* is the adjoint of A and å is the regularization parameter, which tunes the balance 
between stability of the solution and its capability to explain the measured data. More in 
general, the recipe of any regularization algorithm is to minimize a functional which combines 
the misfit - the difference between the actual data and the model - with some type of norm of 
the solution - implicitely encouraging some stability property for the solution. The choice of 
this norm, together with a criterion for choosing the value of the regularization parameter, are 
key ingredients which highly depend on the specific problem. 


2.2 Statistics for inverse problems 

Given the stochastic nature of noise affecting the data, it is natural to view inverse problems 
as statistical inference problems. The core of this approach is to model the data as a 
Random Variable (RV). Then, there exist two main frameworks for dealing with the inverse 
problem: in the Maximum Likelihood (ML) framework, the unknown is viewed as a set 
of parameters which characterize the probability distribution of the data, and the inverse 
problem is re-phrased as a parameter estimation problem; in the Bayesian framework, the 
unknown is viewed as a Random quantity as well, and the inverse problem is re-phrased as 
the problem of calculating the posterior distribution of the unknown, given the data. 
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More specifically, let y be the realization of the Random Variable Y, representing the measured 
data. In the ML framework, x represents the unknown parameters; the distribution of the 
data L(y|x) is called likelihood function and contains both the physical model - the operator 
A mapping the unknown onto the data - and the assumptions on the statistical distribution 
of noise. Typically one wants to find the x which maximizes L for the observed noisy data y; 
such 

£ = arg max(L(y|x)) (4) 


is naturally called Maximum Likelihood estimate of x. 
In the Bayesian framework, on the other hand, x is viewed as the realization of the Random 
Variable X rather than as a parameter, and it is given a prior distribution; this prior is then 
combined with the likelihood through Bayes theorem to produce the so called posterior 
distribution, i.e. the probability distribution for the unknown, conditioned on the realization 
of the data. The solution of the inverse problem is here defined to be the posterior distribution 
of the unknown, which can then be used to compute point estimates and bounds. More 
specifically, let p(x) be the prior probability distribution for the unknown, emboding all the 
information which is known a priori, before the data are collected; let L(y|x) be the likelihood 
function, the same function which is used in the ML framework; the posterior distribution is 
given by Bayes theorem as 
L(y|x) p(x) 

ply) 
where p(y) is the normalizing constant and is given as p(y) = f L(y|x)p(x)dx. 


p(xly) = (5) 


2.3 The linear Gaussian case 
Let us briefly consider a linear inverse problem with additive zero-mean Gaussian noise N, 
with standard deviation g; in the ML framework, the model equation is 


Y=Ax+N (6) 


and the likelihood function is 


1 lly — Ax| a 
L(y|x) = —— exp | -——, — ]. 7 
(lx) = a exp ( -15 ”) 
The ML estimate in this example is the solution of the least squares problem. 
In Bayesian terms, the model equation becomes 
Y= AK +N (8) 


and the posterior distribution under zero-mean Gaussian prior for the unknown turns out to 


be: 
lly — Ax]? Ix]? 
p(x|y) x exp et Daye (9) 


where y is the standard deviation of the prior. Comparison of equations (9) and (2) suggests 
that the minimizer of the Tichonov functional is the mode of the posterior distribution, 
provided that the regularization parameter is identified with the ratio between the noise 
variance and the variance of the prior distribution. 


www.intechopen.com 


96 Magnetoencephalography 


As this simple example suggests, in the statistical framework regularization is interpreted 
as providing a priori information to solve the ill-posed problem. Furthermor, the statistical 
framework is an elegant and unifying framework where linear as well as non-linear problems 
can be formally described. Rather than providing computational recipes, this framework 
provides a theoretical background which is general enough for different kinds of problems, for 
different kinds of distributions of the noise, and always provides a meaningful interpretation 
of the variables and parameters in the game. 


3. Basic facts about the MEG inverse problem 


3.1 Ill posedeness 
The magnetic field b produced by a given electrical current distribution j(r) inside a volume 
V and measured at a point r ¢ V is given by the Biot-Savart equation 


r-r ,, 
=f f ie) x yee (10) 


It has been known for 150 years that the problem of estimating the electrical current from 
measurements of the magnetic field is an ill posed problem. Recently a few rigorous proofs 
have appeared in the literature, which are summarized in the following. 


Let B1 : (C(V))* > (C(D))? be the operator described by (10), mapping the electrical current 
distribution, belonging to the space (C(V))° of continuous function within the brain volume 
V, onto the magnetic field, belonging to the space (C(D))? of continuous functions on a 
surface D which contains the boundary of V. In Kress et al. (2002) it has been proved that 
the kernel of 6; contains the linear subspace 


M = {j|j = Am,m € (Co(V))°}. (11) 


This implies that the inverse problem has no unique solution; in particular, given a solution 
j* of the inverse problem (10), any current distribution of the form j = j* + Aj’, with j’ € M 
and A € R is also a solution of the problem. 


Let Bp : (L?(V))> — (L?(V))° be the operator described by (10), mapping the electrical 
current distribution in the volume V onto the magnetic field within the same volume V. In this 
case, all three components of both the electrical current and the magnetic field are assumed to 
be L? functions. In Cantarella et al. (2001), it has been proved that Bz is a compact operator. 
This result implies that the generalized inverse is not a continuous operator. 


3.2 The inverse model 

The Biot Savart equation (10) is only a partial description of the MEG inverse problem: there 
are two main issues which need to be discussed in order to provide computational solutions 
to the inverse problem. The first issue, only mentioned here because it is not relevant in 
the rest of the Chapter, concerns the geometrical and physical model used to approximate 
the head; it is usually referred to as the forward problem, because it amounts to accurate 
calculations of the magnetic field produced by a known neural current distribution. Due to 
the non-zero conductivity of the tissues inside the head, the electrical currents elicited by 
brain activity are only a portion of the electrical currents flowing in the brain; this means that 
j(r) in (10) contains both neural activity and passively flowing currents, which are generated 
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by the neural activity itself and contribute to the measured field as well. The forward problem 
accounts for the mechanisms by which the neural currents generate the passive currents, 
and how these passive currents flow in the brain. In the simplest model, available since 
the Eighties, the head is modelled as a homogeneous spherical conductor, and the effect of 
the passive currents can be calculated analytically (Sarvas, 1987). A more realistic model, 
nowadays largely used, takes into account the actual shape of the brain, while maintaining a 
piecewise homogeneous conductivity, and is solved with numerical integration techniques 
known as Boundary Element Methods (Gramfort et al., 2011). More recently the use of Finite 
Element Methods (Pursiainen et al., 2011), far more computationally demanding, allows 
implementation of completely non-homogeneous and even anisotropic conductivity models. 


The second relevant point is the discretization of equation (10). The standard approach 
comprises discretization of the brain volume with a finite set of N points points, possibly 
distributed along the cortical surface. The neural current is then represented as a vector 
field on these support points. The single current element, with support in a single point of 
the brain, is usually referred to as current dipole. It is characterized by a location r and a 
dipole moment q; mathematically speaking it is an applied vector, while from the physical 
viewpoint it represents the first order approximation of any current distribution. The whole 
neural current is then viewed as the superposition of thousands of current dipoles: 


N points k 
jr) = L qae- r") (12) 
k=1 


where qk is the dipole moment of the current dipole at location r and 6(-) is the Dirac 
delta. The Biot-Savart operator is discretized in a matrix G = [G(r'),...,G(r¥mns)] of 
size Nsensors X 3N, points Where each column of the matrix G(rk ) represents the magnetic field 
produced by a unit current dipole at a given position and oriented according to one of the 
three orthogonal directions; this matrix is usually referred to as the leadfield matrix. The 
discretized mathematical model becomes here 


b=G.j (13) 


where the vector b is Neensors X 1 and contains the collection of data measured by all the 
sensors, while the vector j = [j eet jh points |T ig 3M, points X 1 and contains the dipole moments 
of the current dipoles located at the fixed locations contained in the leadfield G. In the 
remainder of the Chapter we will denote by b; and j, the measured data and the neural current 


at time t; the leadfield matrix is usually assumed to be not time dependent. 


3.3 Classification of inverse methods 

Equation (13) is the main equation of practical interest for the solution of the inverse problem. 
It reflects the fact that the inverse problem is linear, in the sense that the measured data are 
the linear superpositions of the contributions of individual current dipoles. The physical and 
geometrical model discussed in the previous section is embodied in the matrix G, which is 
assumed to be known when solving the inverse problem. However, talking about the MEG 
inverse problem may be misleading for the following reason. While in the general case one 
will model the neural current as a vector field with support in the whole brain, and the inverse 
problem amounts to inversion of the (non-invertible) matrix G, in an alternative approach 
one can assume that the brain activity is highly localized in few brain regions; if these regions 
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are small compared to their distance from the sensors, each region can be approximated as 
a single current dipole; in this case, the inverse problem consists in estimating the number 
of sources, their location, orientation and strength. The mathematical properties of the two 
inverse problems are rather different. In the former case, one will face a linear equation 
resulting in a highly under-determined problem due to the ratio between the typical number 
of sensors (~ 100) and the typical number of points (~ 10,000). In the following, we will 
call imaging methods the methods based on a continuous current distribution model. In the 
second case, one will want to estimate the intensity, the orientation, as well as the number and 
the locations of the point sources; the dependence of the magnetic field on the two latter 
parameters is highly non-linear, thus resulting in a genuinely non-linear problem. In the 
following, we will call parametric methods the methods based on a point source model. 

The classification of inverse methods in imaging and parametric methods is rather common 
practice in many reviews and dates back at least to Hämäläinen et al. (1993), where dipole 
methods are separated from regularization methods. In this chapter, we introduce a new - to 
the best of our knowledge - classification of inverse methods, which accounts for a recent trend 
in the MEG inverse problem community. We will call static methods the inverse methods which 
do not give a priori information on the temporal behaviour of the neural sources, and dynamic 
methods those which do provide a prior on it. The recent appearance of dynamic methods is 
indeed a consequence of the increased availability of cheap computational power. Dynamic 
methods are far more computationally demanding for the simple reason that they perform 
regularization in a much higher-dimensional space. At the same time, prior information in the 
temporal domain can really help in stabilizing the solution of this highly ill-posed problem. 


4. Imaging methods 


Imaging methods solve the linear inverse problem defined by 
B; = GJ, +N; (14) 


where: B; € RNs is the Random Vector representing the measured data at time t; G is the 
leadfield matrix as in (13); J; € IRV points ig the Random Vector representing the neural current 
distribution at time t; N; € IR^sensors explicitely accounts for the presence of noise in the data. 
The output of an imaging method is an estimate of J, for every t; as such, it is a sequence 
of current distributions which can be visualized as a dynamic intensity map representing the 
flow of brain activity. 

From a statistical perspective, all the imaging methods we will describe can be interpreted 
as Bayesian estimation methods; this is due to the ill-posedeness of the inverse problem 
which requires rather strong regularization algorithms, ie. a priori information, to get 
stable solutions. Due to the fact that most of these methods have been first described 
in a regularization/optimization framework, the estimate they provide is always easily 
interpreted as the MAP of the posterior distribution. All methods share basically the same 
likelihood function: since noise is assumed to have a zero-mean Gaussian distribution and to 
be uncorrelated with the sources, the likelihood function is: 


nbili) = Tea OP (O -SiE Gip) (15) 


where by, j; are realizations of the data and the unknown RV, È is the noise covariance matrix 
and M = Neensove- 
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What really makes each method unique is the specific choice of the prior distribution for the 
spatio/temporal distribution of the neural current. 


4.1 Static imaging methods - Minimum Norm Estimate 

Minimum Norm Estimate (MNE) is perhaps the most popular method for solving the MEG 
inverse problem. It has been introduced almost thirty years ago and is the straightforward 
implementation of the Tichonov algorithm. As seen in Section 2, this choice corresponds to a 
Gaussian prior distribution 


my exp (allie) (16) 
V2 2y 
The explicit form of the MNE solution, i.e. the MAP of the posterior distribution under the 
given prior, is given by 
j; = (GEIG +AI) G'E 'b; (17) 


where the regularization parameter is A œ a Due to the Gaussian spatial prior, solutions 


provided by MNE tend to be rather smooth, a feature that makes MNE less suitable than 
other methods for estimating focal activity. Furthermore, another limitation of MNE is that 
the zero-mean Gaussian prior gives an implicit preference to superficial sources, which need 
lower intensity to produce the meaured field. This results in an estimation bias which is 
stronger for deeper sources. Despite these drawbacks, MNE is a very attractive method thanks 
to its simplicity and to the very low computational cost. 


4.2 Static imaging methods - weighted MNE, LORETA and LCMV beamformer 

The class of weighted minimum norm solutions has been developed with the aim of 
overcoming the depth bias introduced by the simple prior used by MNE. The main difference 
with respect to standard MNE is that the prior is still Gaussian but the source covariance 
matrix is no longer proportional to the identity matrix; instead, the source covariance matrix 
contains a weighting factor for each source, aimed at removing the bias towards superficial 
solutions. In Lin et al. (2006), for instance, the 3 points X 3N points Source covariance matrix R 
is diagonal with 


Rj; = |G") ||7?, j = 3k — 2,3k — 1,3k, k = 1,...,N points (18) 
where G(r“) is the lead field corresponding to the k-th source point, || - || is the euclidean norm 
and p is the depth weighting parameter. With this choice, deeper sources have a Gaussian 


prior with larger variance with respect to shallower sources. The final solution of WMNE is 
given by the two following equivalent formulations 


ji = RG'(GRG! +E) 'b; = 
= (GTE-IG +R!) IGITE thb; (19) 


Low Resolution Electromagnetic Tomography (LORETA) (Pascual-Marqui et al., 1994) is a 
WMNE with the further addition of a Laplacian operator. The prior for LORETA is given by 


7(j,) & exp ( -jT RL™LR}, ) (20) 


where R is as in (18) and L is the discrete spatial laplacian operator (Pascual-Marqui, 1999). 
Such operator L produces a smoother estimate of the current distribution since the inverse 
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matrix L~! implements a discrete spatial smoothing operator. LORETA has been shown to 
provide accurate localization even for deep sources. 


The term beamformer encompasses a broad range of algorithms which have become popular in 
the last fifteen years. In one way or the other, most or all of them can be proved to be particular 
choices of WMNE (Mosher et al., 2003). Here we only consider the Linearly Constrained 
Minimum Variance Beamformer (Van Veen et al., 1997). The derivation of this method is made 
in an engineering framework and is based on a constrained minimization. The idea behind 
beamforming is that of constructing a spatial filter which isolates the activity produced in a 
selected location, blocking the activity originated elsewhere. The main working assumption 
of the method described in (Van Veen et al., 1997) is that all the sources are uncorrelated. The 
result of the constrained minimization turns out to be 


it = KCER G] (G(r) (GRE? +) b(t) (21) 


where R; is the data covariance matrix and i: is the estimated current density at k-th location. 
We remark that this solution corresponds to the WMNE solution (19) if the source covariance 
matrix R is a block diagonal matrix, composed by the 3 x 3 matrices: 


RE) = (G) TR eE] 22) 


and zero elsewhere. Importantly, the assumption that the neural sources are not correlated 
in time is a strong assumption, often not verified in experimental data. As a consequence of 
this assumption, however, beamformers face severe difficulties when the data actually contain 
signals produced by correlated sources. An improved version of the beamforming technique 
has been proposed, which tries to overcome this limitation (Sekihara et al., 2002). 


4.3 Static imaging methods - dSPM and SLORETA 

Dynamic Statistical Parametric Mapping (dSPM) and Standardized LORETA (sLORETA) 
combine Bayesian inference for the electrical current with noise normalization. The ouput 
of these methods is no longer an estimate of the neural current distribution, but rather a 
statistical measure of brain activity. 


Instead of imaging the estimated electrical current intensity, SPM (Dale et al., 2000) produces 
maps of the following neural activity index 


ak 
S PE 
t= WET EwW E = 


where W(r*) is the regularized inverse operator in (19) 


W(t") = (GRG? + 2)-!G(r*)R(r*) (24) 


È is the noise covariance matrix, and || - ||p is the Frobenius norm; in practice, the output 
of a weighted MNE j, at the k-th location is divided by an estimate of the portion of the 
source variance, at the same location, which is due to the noise in the data. Under the null 
hypothesis of zero brain activity, the resulting quantity z; has a Student's t distribution. 
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sLORETA (Pascual-Marqui, 2002) consists in a similar approach, but the standardization is 
obtained considering the variance of the estimated sources (19), instead of using just the 
variance due to the noise component. SLORETA produces maps of the quantity 


akyi9 
T- HE 
T — ; 
|[(WG*))T (GRG! +2)-1W(*)||f 
The claim of the authors is that this method produces unbiased, zero-error localization of point 


sources. SLORETA is largely used in EEG source analysis, where source localization is made 
particularly difficult by the strongest impact of the passive currents on the measured data. 


(25) 


4.4 Static imaging methods - minimum current estimate 

The MCE algorithm (Uutela et al. (1999)) is substantially different from all the imaging 
methods presented so far in that it does not use a Gaussian prior for the current distribution, 
but an exponential prior instead: 


(jx) = exp(-|ljell1) (26) 
N, points | sk 
=1 {Jt l 
This is usually phrased in the regularization framework as using an L! prior, and the main 
consequence of this choice is that sparsity of the solution is promoted, so that estimates 
provided by MCE are typically highly focused. 
In contrast to the MNE, MCE is therefore well suited for estimating focal activity. As a 
drawback, it has been suggested that MCE is in fact not capable to recover smooth current 
distributions, always shrinking to few points the estimated active area. From a computational 
viewpoint, the minimization with the L1 norm cannot be expressed any more in a closed 
form. Linear programming algorithms can be used to solve the minimum problem, but the 
computational cost is remarkably higher compared to that of methods based on a Gaussian 
prior. 


3 
where [ljal = £ 


4.5 Dynamic imaging methods - L1L2 regularization 

From the conceptual point of view, the L1L2 approach (Ou et al., 2009) consists in replacing 
the time-varying Random Vectors in equation (14) with the full random matrices containing 
the whole spatio-temporal pattern of the data and of the brain activity: 


B=GJ+N (27) 
where B = |B}, ... Br], J = [Ji ---,Jr] and N = [N1,...,Nr]; the functional to be minimized 
is now 

IIB — GJ| | + IIIa (28) 
with 


3N points T 


W= X ILE (29) 
k=1 t=1 


In Bayesian terms, this can be viewed as using a Gaussian prior distribution in time and a 
Laplacian prior distribution in space. This type of regularization promotes both sparsity in 
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the spatial domain, in order to produce good estimates of focal sources, and smoothness in 
the temporal domain. From the computational viewpoint, this method is realized by means 
of a projection of the data and of the source space onto a subset of basis functions, to reduce 
the dimensionality; a gradient method is used to minimize the functional, i.e. to calculate the 
approximate MAP of the posterior distribution. 


4.6 Dynamic imaging methods - Kalman filter 

In Kalman filtering, one explicitely models the neural currents and the measurements as 
two stochastic processes, {J,,...,Jr}, {B1,...,Br}. Under Markovian assumptions for these 
processes, the posterior distribution 7t(j,|bj.;) can be computed at any time-point t applying 
sequentially the following two equations: 


7(j,\by+-1) = [Ghia Galbi) (30) 
b; lj j |b. 
ria = Teirin a 


where {b1:;} denotes the set of measurements up to time t and 7t(j;|j;—1) is the transition 
kernel wich describes the evolution of the neural currents in time. Under Gaussian noise, 
Gaussian prior assumptions, the posterior distribution remains Gaussian at any time point, 
and these two equations have closed form solution through the so-called Kalman filter: 
at every time point, the mean and the covariance matrix of the Gaussian distribution are 
updated. The main difference between this approach and all the previously described 
approaches is that here the data are analyzed in sequence. The solution at time t depends 
on all the past history of observations; the prior density in the Kalman filter is time dependent 
and is given by 


t 
m(ja) = (jo) TE Gnin): (32) 

n= 
The main practical issue in the application of a Kalman filter to the MEG inverse problem is 
that the Kalman filter requires to invert the source covariance matrix at each time step. In 
MEG, this covariance matrix is typically very large, being 3M points X 3N points: In Galka et al. 
(2004), the very high dimensional problem is decomposed in a collection of low dimensional 
problems, exploiting the local nature of the covariance matrix; in Long et al. (2006), a network 

of computers is instead used to solve the full problem. 


5. Parametric methods 


Parametric methods solve the non-linear inverse problem defined by 


Noaipoles a 
B= )) G(Ri)Q +N; (33) 
i=1 
where N dipoles is the unknown number of active dipoles at time ft, Rİ and Qi are the 
Random Vectors representing the location and the dipole moment of the i-th source at time t, 
respectively, and G(r) is the lead field of a unit dipole at location r. Remarkably, the number 
of active dipoles is not expected to be higher than, say, ten; this implies that the number of 
unknowns in the parametric approach is considerably lower than the number of unknowns in 
the imaging approach. However, as already observed, the data depend now non-linearly 
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on the parameters, namely, on the number of sources and on the source locations. This 
non-linearity causes in fact several problems if one attempts to solve the full problem. For 
this reason, several methods for dipole estimation work under two important assumptions: 
first that the number of sources is constant in time, and second that the source locations are 
fixed. Only recently, Bayesian methods have appeared which try to overcome these artificial 
constraints. The authors have personally developed one of these methods, namely particle 
filtering; to this method is devoted Section 6. In the present Section, we give a brief overview 
of the other main parametric methods. 


5.1 Static parametric methods - dipole fitting 

Dipole fitting is one of the oldest and most largely used methods for solving the MEG 
inverse problem. In dipole fitting, current dipoles are estimated one at a time, at fixed time 
points; the user typically chooses the best time point, the one at which the data exhibit 
a dipolar field pattern, and then starts a non-linear optimization algorithm - typically a 
Levemberg-Marquardt algorithm - which maximizes the likelihood for a single current dipole 
at that particular time point 


exp (—(bi — G(r) -q)"Z1(b — G(r) - q)) (34) 


rand q being the location and dipole moment of one single source. The single dipole fitting 
procedure is typically iterated, usually at different time points, until a satisfactory set of 
different dipoles is found. To produce an estimate of the temporal behaviour of the estimated 
sources, a multiple dipole model is constructed by combining the estimated dipole locations 
and - possibly - orientations; the dipole strengths are then estimated linearly from the data, by 
optimization of the likelihood 


N dipoles ; : . ; 
exp |- |b- } G(r) q E |b- Y G(r) q (35) 
i=1 I 


A key problem with dipole fitting is that it strongly relies on the personal experience of the 
person who performs the analysis; in some sense then, despite it never formally makes use of 
a prior density, it may be considered as a Bayesian method due to the strong prior embodied in 
the human experience. A second relevant problem of dipole fitting is that, involving decision 
making from the user, it is usually quite time consuming. 


5.2 Static parametric methods - MUSIC and RAP-MUSIC 

The Recursively Applied and Projected MUltiple Slgnal Classification (RAP MUSIC) 
algorithm is perhaps the most largely known method for automatic estimation of current 
dipoles from MEG data. It was first introduced in Mosher et al. (1992) and further developed 
in order to deal efficiently with correlated sources (Mosher & Leahy, 1999). 

MUSIC is based on a simplified version of the model described in equation (33), as it assumes 
that the number of dipoles and their locations are fixed in time; thus the model becomes now 


Nai poles 


By= J}, G(R')U'-S;+N;=A-S +N; (36) 
i=1 
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where A = [G(R!)U!,...,G(RN«irotes )UN«irotes] is the field generated by Naipoles dipoles 
with locations {RÝ} and orientations{U'}; SÍ is the strength of the i-th dipole and S; = 


Caer oe s] the amplitudes vector; the dipole moment is therefore Q! = U' - S. 


If one further assumes that the neural sources are not correlated, simple calculations yield for 
the data covariance matrix Rp 


R; = ATRA +02 


oisel (37) 
where Rs is the covariance matrix of the sources. Due to the assumption of uncorrelated 
sources, the source covariance matrix has full rank, and the data covariance matrix contains 
N, dipoles eigenvalues considerably larger than the others; this number represents an estimate 
of the number of dipoles. 

Furthermore the corresponding eigenvectors span the so called signal subspace ¢; and 
an estimate of the unknown parameters {r',u'} is obtained by maximizing the subspace 
correlation (Golub & Loan, 1984) between the dipole field and the signal subspace: 


subcorr(G(r)u, ds). (38) 


Once the locations and orientations are estimated, the source amplitudes are found by a least 
square approach. A recursive procedure deals with the problem of local minima. 

From a statistic view point it can be shown that the estimated parameters represent a 
Maximum Likelihood estimate of the Random Vector in the model (33) (Stoica & Nehorai, 
1989). 


5.3 Dynamic parametric methods - MCMC of current dipoles 

A more recently proposed strategy for solving the parametric MEG problem is that of using 
Bayesian inference for approximating the posterior density in the parameter space (Jun 
et al., 2005). In these studies, the authors assume a fixed but unknown number of spatially 
stationary dipolar sources; the modeling parameters are then the Random Variable of the 
number of sources N dipoles and the Random Vectors of the source locations R € ReNaipotes 


of the source time courses S € R^ XT and of the dipole orientations U € R2Naipotes | The 
authors seek an approximation of the posterior density through Bayes theorem (5): 


7(Ndipoless r,s,u|b) = 
7(b|Mdipoless r,s,u)7(ulr, Ndipoles) X 


x T(S|Nndipoles) m(r| Ndipoles ) 7 (dipoles) 7 (39) 


where the probability distributions for the source parameters are (necessarily) conditioned on 
the number of sources. Uniform distributions are used for N, dipoles, U and R, while a Gaussian 
distribution is assumed for S. In order to explore the high-dimensional posterior density, they 
resort to reversible jump Markov Chain Monte Carlo techniques after marginalization of the 
posterior density with respect to the noise covariance matrix and the source time courses. 
In Jun et al. (2006), the multi-dipole model is refined by introducing a source-dependent 
temporal activity window: in other words, each dipole is active only for a finite time interval, 
and is totally silent for the rest of the sequence, thus avoiding possible cross-talk phenomena. 
The implementation is again performed with a Reversible Jump Markov Chain Monte Carlo 
algorithm. 
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6. Highly Automated Dipole EStimation 


In this section we present the formulation of the multi-dipole particle filter we have been 
developing in collaboration with our colleagues, and which we have published as an online 
open source code under the name of HADES (Highly Automated Dipole EStimation). The 
idea behind the method is that of using the Bayesian filtering equations (43)-(44) in a 
multi-dipole setting rather than in the imaging framework. In practice, the algorithm tracks 
in time the probability distribution of the dipole parameters; the main difficulties here are 
related to the non-linear dependence of the data on the dipole parameters, and on the 
necessity of estimating the number of dipoles as well. The particle filtering approach has 
been first proposed in a methodological paper, (Somersalo et al., 2003). In a subsequent 
set of publications, with our colleagues we have investigated the use of particle filtering for 
estimation of current dipoles from experimental MEG data (Sorrentino, 2010; Sorrentino et al., 
2009), and compared particle filtering with other source estimation methods in Pascarella et al. 
(2010). Particle filtering has also been used to solve the EEG inverse problem in Antelis & 
Minguez (2010) and Mohseni et al. (2008); however, in these studies a simpler setting was 
used where the number of dipoles is fixed a priori. 

The main advantages of the approach outlined below are that (i) it is a very general approach 
to multiple dipole modeling, as it explicitely allows the number of dipoles to be time varying 
and provides dynamically a posterior probability for the number of sources; (ii) it can be used 
online in real time, at least in principle, because data are processed in sequence and not as a 
whole. 

In the following, we first describe the mechanism of the most basic particle filter; we then 
consider the practical modeling of multiple dipole with Random Finite Sets; we outline briefly 
the use of semi-analytic techniques, and eventually describe the open source software with 
Graphical User Interface for use of particle filtering. 


6.1 Particle filtering 

Bayesian filtering relies on Markovian assumptions for the data and the unkown processes: 
if J; and B; represent the neural current and the measured data at time t, respectively, these 
assumptions can be formulated in terms of conditional distributions, and amount to 


T(beljp jira Irae) = (bel jp) (40) 

(jelje—a-br_1,--- b1) = Telje) (41) 

Telji) = Gel) - (42) 

Given these premises, and assuming knowledge of the two conditional distributions at the 


right hand side of (40)-(42), the posterior distribution at time f can be obtained from the 
posterior distribution at time t — 1 with the two following equations 


re(jlbrs—a) = f Gjdo- (43) 


be lj.) (ji lbi:+1) 
(by |bi:+-1) 

in (43) a prior density at time t is obtained by convolution of the posterior density at time t — 1 

with the transition kernel; in (44), the posterior density at time t is obtained by combining the 

prior with the likelihood in the standard Bayes theorem. In addition to the two conditional 


7(j,|bi+) = a (44) 
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distributions mentioned above, the density 7t(jg|bo) = 7(jg) at time t = 0 is needed for 
initialization. 

Exact computation of these two equations is possible under rather restrictive assumptions, 
such as the ones leading to Kalman filtering and described in Section 4. For more general 
cases, such as the one considered here which involves a non-linear relationship between 
the data and the unkonwns, numerical techniques have to be used to get reasonable 
approximations of the posterior densities. Particle filters are one of the most used numerical 
techniques for solving this type of problems. They are basically a sequential Monte Carlo 
sampling technique. In their simplest form, the SIR (Sampling Importance Resampling) 
particle filter, they amount to sequential application of an importance sampling - resampling 
mechanism. 

The main steps of the SIR particle filter are: 


0 [initialization] draw Nparticies sample points {johixt... „Narts distributed according to 
7(jg|bo); the sampled points represent therefore an approximation of the initializing 
density: 


: N particles 1 : zi 
miye Do Naa ôo —Jo) : (45) 
i=1 particles 


1 [Evolution] for t > 0, let Gh 1... Npartices PE a Sample distributed according to 7(j,|b1:+); 
then exploiting the Chapman- Kolmogorov equation we may approximate the next prior 
probability density function as follows: 


7(jr44|bit) = | rlilid Gulbis diy ~ 


1 N particles zi 
=- }, mml) ; (46) 


N particles i=1 


sample the so obtained prior density, by drawing a single particle ji 44 from m(ja Ñ) for 
each i; we have now an approximation of the prior at time t + 1: 


Me , 
T(jr41|b1:) a Slj — Sea) (47) 
ao 


particles 


2 [Observation] apply the Bayes theorem, i.e. compute the relative weights of the particles: 


Weis = r(by41|ji+1) j (48) 


and then normalize the weights so that }; ae wi +1 


posterior probability density function is given a 


= 1; an approximation to the 


oe 


naibi) S L Wa ô(ja1 — lea) ; (49) 
i=1 
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3 [Resampling] resample the sample set representing the posterior density: extract a new set 


ee particles 


of particles j 41 A 
of drawing the i- ne Ss is wi 41 (see Arulampalam et al. (2002) for the details); we get 


from the old set {ji 41 , in such a way that the probability 


N particles zi 
m(ja |bi:t+1) = L — lji — Sepa): (50) 


particles 


go to Step 1. 


Remarkably, the combination of Step 1 and 2 can be viewed as an importance sampling. 


6.2 Filtering of multiple dipoles with Random Finite Sets 

In order to deal with a time-varying and unknown number of sources, in Sorrentino et al. 
(2009) we have proposed the use of Random Finite Sets as a mathematical tool to model the 
neural current. 

Random Finite Sets (Molchanov, 2005) are generalizations of Random Vectors to a non-fixed 
dimension case. As the name suggests, realizations of a Random Finite Set are sets containing 
a random number of random objects. The use of Random Finite Sets for multiple target 
tracking has been proposed in (Mahler, 2003) for the first time. 

In the RFS approach to the MEG problem, the neural current at time t is assumed to be a RFS 
of current dipoles, denoted by J; = {D.., DY }, where each Di is a Random Vector containing 
the parameters of a single current dipole and the Random Variable N; represents the number 
of sources at time t. A realization of J, will be denoted as J;. The model now amounts to: 


Ina = E( A) U Bipa (51) 
By =B(A)+W; . (52) 


In (51), the dipole set at time t + 1 is given by the union of the set E( J+), containing the subset 
of dipoles active at time t which have survived and evolved, and the set of new born dipoles 
B:41. In (52), the Random Finite Set of dipoles J, produces the data; since the dimension of 
the data vector is fixed to the number of sensors in the MEG device, the data is represented as 
a standard Random Vector. 

Despite the formulation in terms of Random Finite Sets, single dipole probabilities can be 
used to describe most of the models as long as individual dipoles are assumed to behave 
independently on each other . In the following we describe the main features of the model in 
terms of single dipole probability density functions. 


6.2.1 Initializing density 

HADES adopts an uninformative initializing density for the multi-dipole states. Let Vmay be 
the maximum number of dipoles in a RFS (which we need to fix for computational reasons); 
then the prior probability of different models is uniform, P(m9) = 1/(Vmax +1) for ng = 
0, ..., Vmax. Dipoles are assumed to be independent and identically distributed with the same 
density used for the single-dipole case; the probability density function of each dipole Dg € J 
is 


1 = 
7(dg|bg) « y ŽP (3240 ) (53) 


where we have defined separately the uniform density for the dipole location in the volume 
V and the zero-mean Gaussian density with covariance matrix & for the dipole moment. 
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6.2.2 Likelihood function 

Due to the linearity of the Biot-Savart equation, the magnetic field produced by a set of sources 
is given by the linear combination of the magnetic fields produced by the individual dipoles 
contained in the set. If we denote by b(J;) the magnetic measurement produced by the dipole 
set J; under the standard assumption of additive white Gaussian noise, with covariance 
matrix Ł, the likelihood function is 


(bilje) & exp (—[b: = bU)]7Z b: = BUF!) (64) 


6.2.3 Transition kernel 
Lack of knowledge on the true dynamics of the neural sources is coded in a transition kernel 
based on the random walk. The transition probabilities between sets with different number of 
dipoles are uniform. The evolution of a dipole set containing n dipoles at time t is governed 
by the following rules: 


e the probability that the dipole set contains n dipoles at time t + 1 is 1/3; the probability 
that it contains n + 1 dipoles is 1/3; the probability that it contains n — 1 dipoles is 1/3; 


e if the dipole set at time t + 1 contains n — 1 dipoles, the dipole to be removed is selected 
randomly from the set; 


e if the dipole set at time t + 1 contains n + 1 dipoles, the new dipole is drawn from the same 
distribution of the initializing prior, eq. (53); 


e all the surviving dipoles of a dipole set evolve independently according to the 
single-dipole transition kernel: 


m(di+1|de) ~ Tgr) (t1) x exp — (lam - q] E5 la — ail) (55) 


where Tga) (+) is the indicator function of a ball with center in r and 3; is the covariance 
matrix for the Gaussian random walk. 


6.2.4 Source estimates 


Point estimates of the dipole parameters are provided in a three-step automatic procedure. 
First, the number of sources is estimated from the marginal distribution 


P(e =H) = f g Ubin 66) 


where S(k) is the set of all subsets with k elements; the maximum of the marginal distribution 
can be used as an estimator: 
A= arg max P(| J =k). (57) 


Second, source locations are estimated as the maxima of the marginal quantity often referred 
to as Probability Hypothesis Density (PHD) in the RFS framework. The PHD is defined as 


PHD(d;) = f m({ai} UYsY (58) 


where the integral is a set integral, { d; } is a single-dipole set and 7r(-) is the probability density 
function of the Random Finite Set 7. In a particle filter implementation, the PHD can be 
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approximated by 
N particles ; 
PHD(d)= Jo w|} d(d—d)] . (59) 
i=1 deji 


The PHD can also be viewed as the intensity measure of a point process. 
Third and last step, once the dipole locations have been estimated the dipole moments are 
estimated with a linear least squares. 


6.3 Rao-Blackwellization 
The SIR particle filter described in the previous subsection is very general as it allows 
approximation of non-Gaussian densities which arise due to the non-linear dependence 
between the data and the parameters. However, in the MEG case such non-linearity is limited 
to the parameters describing the location of the sources; on the other hand, the data depend 
linearly on the dipole moment; this linearity can be in fact exploited using a technique known 
as Rao-Blackwellization. The main idea is that the posterior density can be split (Campi et al. 
(2008)) 

(xt, qelbit) = (qire b1:+)7(re|b1:) (60) 


At the right hand side, the density for q; 7r(q;|rt, b1:+) conditional on r; is a Gaussian density, 
whose parameters can be tracked with a Kalman filter; on the other hand, the locations 
parameters must be sampled, which leads to an algorithm that realizes a set of Kalman filters 
for the dipole moment, one for each sampled location. 


6.3.1 Clustering 

Due to its dynamic nature, the particle filter described above tends to provide slightly moving 
estimates of static dipoles. In order to produce a compact description of the estimated 
dipoles, it can be useful to apply a clustering algorithm to the whole set of estimated dipoles 
{dy, ae di', a ât, _ d,'}. In Sorrentino et al. (2009) we have applied an iterative algorithm 
based on the k-means (Spath, 1980) to obtain a small set of neural sources. The iterations 
are needed because the number of clusters is unknown, while the k-means works with a 
fixed number of clusters: hence, we first cluster the dipoles in a large number of clusters 
Netustersi then apply the Wilcoxon test (Weerahandi, 1995) and check whether all clusters 
are significantly diverse; if not, we set Neiysters = Neiusters — 1 and try again the clustering 
procedure; if they are all diverse, the algorithm stops. A similar idea, implemented in a more 
sophisticated strategy, has been proposed in Clark & Bell (2007). 


6.4 HADES 

The particle filter described above has been implemented in Matlab, and is 
currently distributed with a Graphical User Interface, named HADES (Highly 
Automated Dipole EStimation) (Campi et al. (2011)). The software can be 
downloaded from = http://hades.dima-unige it. The software is open source 
and is released under the standard GPL license. Through the MNE toolbox 
(http://www.nmr.mgh.harvard.edu/martinos/userInfo/data/MNE register/index.php) it 
supports input/output in Neuromag fif, FreeSurfer .w and MNE .stc formats, further than 
the standard Matlab .mat format. 
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7. Conclusion 


Localization of brain activity from MEG recordings is a challenging problem which has been 
solved in a variety of frameworks and using many different algorithms. In this Chapter 
we have attempted a short review of some of these methods, the ones which seem to 
be most used in the MEG community, with the aim of describing them in the unifying 
framework offered by statistics. Interestingly, when described in such framework some clear 
connections between many of these methods appear naturally. Indeed, it emerges clearly 
from this review that the only difference between most of these methods, except for the 
type of source model, relies in the choice of the prior density. It also emerges that the main 
difference between more recent and previously developed methods is that the recent methods 
tend to exploit the dynamic nature of the inverse problem by performing regularization 
in the whole spatio-temporal domain; in statistical terms, this corresponds to providing 
a priori information on the temporal behaviour of the sources in addition to the a priori 
information on the spatial distribution. Among these new dynamic methods, we have given 
particular emphasis to the description of HADES, the multi-dipole particle filter we have been 
developing in collaboration with our colleagues. We look forward to see how far this new 
generation of spatio-temporal solvers can reach, and what the next generation will be. 
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